Recovering dropped pronouns from Chinese text messages

نویسندگان

  • Yaqin Yang
  • Yalin Liu
  • Nianwen Xue
چکیده

Pronouns are frequently dropped in Chinese sentences, especially in informal data such as text messages. In this work we propose a solution to recover dropped pronouns in SMS data. We manually annotate dropped pronouns in 684 SMS files and apply machine learning algorithms to recover them, leveraging lexical, contextual and syntactic information as features. We believe this is the first work on recovering dropped pronouns in Chinese text messages.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Dropped personal pronoun recovery in Chinese SMS

In written Chinese, personal pronouns are commonly dropped when they can be inferred from context. This practice is particularly common in informal genres like Short Message Service (SMS) messages sent via cell phones. Restoring dropped personal pronouns can be a useful preprocessing step for information extraction. Dropped personal pronoun recovery can be divided into two subtasks: (1) detecti...

متن کامل

Annotating dropped pronouns in Chinese newswire text

We propose an annotation framework to explicitly identify dropped subject pronouns in Chinese. We acknowledge and specify 10 concrete pronouns that exist as words in Chinese and 4 abstract pronouns that do not correspond to Chinese words, but that are recognized conceptually, to native Chinese speakers. These abstract pronouns are identified as “unspecified”, “pleonastic”, “event”, and “existen...

متن کامل

Neural Recovery Machine for Chinese Dropped Pronoun

Dropped pronouns (DPs) are ubiquitous in pro-drop languages like Chinese, Japanese etc. Previous work mainly focused on painstakingly exploring the empirical features for DPs recovery. In this paper, we propose a neural recovery machine (NRM) to model and recover DPs in Chinese, so that to avoid the non-trivial feature engineering process. The experimental results show that the proposed NRM sig...

متن کامل

A Novel Approach to Dropped Pronoun Translation

Dropped Pronouns (DP) in which pronouns are frequently dropped in the source language but should be retained in the target language are challenge in machine translation. In response to this problem, we propose a semisupervised approach to recall possibly missing pronouns in the translation. Firstly, we build training data for DP generation in which the DPs are automatically labelled according t...

متن کامل

Young driver distraction by text messaging: A comparison of the effects of reading and typing text messages in Chinese versus English

Background: Reading and typing text messages while driving seriously impairs driving performance and are prohibited activities in many jurisdictions. Hong Kong is a bilingual society and many people write in both Chinese and English. As the input methods for text messaging in Chinese and English are considerably different, this study used a driving simulator approach to compare the effects of r...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015